Motivation#

Main purpose: understand how well SF’s 911 medical call response system is performing and see where it can be improved.

Key Insights#

  • received_to_onscene: 64.83% of code 3 calls arrive within 10 minutes

  • received_to_onscene: 70.69% of code 2 calls arrive within 20 minutes

  • Time to onscene: roughly 80% of the total duration is spent between a unit responding and arriving on scene.

  • The remaining ~20% is spent actually getting a unit dispatched.

  • Outer regions have a slightly longer total response time; the difference is statistically significant.

Recommendations#

  • Investigate how ambulances are being dispatched: are they taking optimal routes? What are the biggest blockers to them getting to the scene?

  • Investigate what takes so long for available units to be dispatched once a call is received. Is there good communication between DEM and SFFD?

Table of contents#

Background#

The life cycle of a 911 call: source

According to the 2023 SF Annual Performance report, SFFD responds to 80-90% of 911 calls requiring an ambulance, and their goal is to respond in under 10 minutes for code 3 calls (life-threatening) and under 20 minutes for code 2 calls (non-life-threatening). Their definition of response time is the “Roll/Response” time (see above, source), which is only a portion of the lifecycle of a 911 call. However, if I’m the caller, I ultimately care about the entire duration between initiating the call and the ambulance arriving - the “Call Duration” time (see above). That is the metric I analyze in this project.
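As a minimal illustration of the difference between the two metrics (toy timestamps; the `received_dttm`/`response_dttm`/`on_scene_dttm` fields mirror the ones used later in this notebook):

```python
import pandas as pd

# Toy timestamps mirroring the dataset's received/response/on-scene fields
calls = pd.DataFrame({
    'received_dttm': pd.to_datetime(['2024-01-01 10:00', '2024-01-01 11:00']),
    'response_dttm': pd.to_datetime(['2024-01-01 10:04', '2024-01-01 11:07']),
    'on_scene_dttm': pd.to_datetime(['2024-01-01 10:09', '2024-01-01 11:21']),
})

# SFFD's "Roll/Response" time: unit responding -> on scene
calls['roll_response_min'] = (calls.on_scene_dttm - calls.response_dttm).dt.total_seconds() / 60
# The caller-centric "Call Duration": call received -> on scene
calls['call_duration_min'] = (calls.on_scene_dttm - calls.received_dttm).dt.total_seconds() / 60
print(calls[['roll_response_min', 'call_duration_min']])
```

The second call looks fine by the Roll/Response definition (14 minutes) but takes 21 minutes end-to-end from the caller's perspective.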

Data sources#

SFFD and EMS Dispatched Calls for Service

Analysis neighborhoods

SFFD fire stations

Hospital Diversions

Call duration breakdown#

  • show joint plots

  • plots by hour of day, day of week

Exploring geographies for responding#

Exploring dispatching#

  • most aren’t late

  • diversions aren’t an issue (seemingly)

Data intro / cleaning#

  • show filters, and what percent of data this knocks out

  • show calculation of dates, percentages
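The second bullet amounts to a small filter-accounting pattern; a sketch on toy data (the threshold values and column name are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'duration_min': [5, -1, 30, 200, 12]})

n0 = len(df)
step1 = df[df.duration_min >= 0]          # drop negative durations
step2 = step1[step1.duration_min < 120]   # drop implausibly long calls

# Report how much of the raw data survives each filter
for name, frame in [('non-negative', step1), ('< 120 min', step2)]:
    pct_kept = 100 * len(frame) / n0
    print(f'{name}: {len(frame)} rows kept ({pct_kept:.1f}% of raw)')
```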

high level#

import plotly.io as pio
pio.renderers.default = 'notebook'
from pylab import rcParams
rcParams['figure.figsize'] = 13, 6
import pandas as pd
raw = pd.read_parquet('raw_data_analysis/08122024-5yr.parquet')
print(f'raw shape: {raw.shape}')
df = raw[raw.call_date.dt.year >= 2021].copy()  # copy so later assignments don't hit a view of raw
print(f'2021 onwards shape: {df.shape}')
raw shape: (1626986, 34)
2021 onwards shape: (1212862, 34)
df.loc[:,'on_scene_hour'] = df.on_scene_dttm.dt.hour
df.loc[:,'on_scene_day_of_week'] = df.on_scene_dttm.dt.dayofweek
from sklearn.preprocessing import StandardScaler
def add_norm_lat_long(df):
    df = df.copy()  # work on a copy so we don't mutate a slice (avoids SettingWithCopyWarning)
    lats, longs, slats, slongs = [], [], [], []
    for coord in df.case_location.values:
        if coord and 'coordinates' in coord:
            long, lat = coord['coordinates'][0], coord['coordinates'][1]
            slongs.append(long)
            slats.append(lat)
            longs.append(float(long))
            lats.append(float(lat))
        else:
            slongs.append(None)
            slats.append(None)
            longs.append(pd.NA)
            lats.append(pd.NA)
    df.loc[:,'lat'] = lats
    df.loc[:,'long'] = longs
    df.loc[:,'slat'] = slats
    df.loc[:,'slong'] = slongs
    

    df = df[df.lat.notnull() & df.long.notnull()]

    lat_long_scalar = StandardScaler()
    norm_lat_long = lat_long_scalar.fit_transform(df[['lat', 'long']])
    norm_lat_long_df = pd.DataFrame(norm_lat_long, columns=['norm_lat', 'norm_long'])
    norm_lat_long_df.index = df.index
    df_normed_lat_long = pd.concat([df, norm_lat_long_df], axis=1)
    
    return lat_long_scalar, df_normed_lat_long

lat_long_scalar, df = add_norm_lat_long(df)
df = df[df.lat.notnull() & df.long.notnull()]
df.unit_type.value_counts(normalize=True)
unit_type
ENGINE            0.317043
MEDIC             0.304962
TRUCK             0.096494
PRIVATE           0.081911
CHIEF             0.071156
SUPPORT           0.044188
RESCUE CAPTAIN    0.036955
CP                0.025088
RESCUE SQUAD      0.013243
BLS               0.007993
INVESTIGATION     0.000877
AIRPORT           0.000089
Name: proportion, dtype: float64
df['received_to_dispatch'] = (df.dispatch_dttm - df.received_dttm).dt.total_seconds() / 60
df['dispatch_to_response'] = (df.response_dttm - df.dispatch_dttm).dt.total_seconds() / 60
df['response_to_onscene'] = (df.on_scene_dttm - df.response_dttm).dt.total_seconds() / 60
df['received_to_onscene'] = (df.on_scene_dttm - df.received_dttm).dt.total_seconds() / 60
df = df[(df.received_to_dispatch >= 0) & (df.dispatch_to_response >= 0) & (df.response_to_onscene >= 0)]
df.shape
(961275, 46)
import seaborn as sns
import matplotlib.pyplot as plt
tmp = df[df.unit_type == 'MEDIC']
tmp = tmp.loc[tmp.groupby('incident_number')['dispatch_dttm'].idxmin()]
ax = sns.boxplot(tmp, y='received_to_onscene', hue='final_priority', hue_order=['2', '3'], showfliers=False)

# Retrieve the lines that make up the boxplot
for i in range(2):
    # Get the whiskers and box information
    # Lines are ordered as: whisker_min, whisker_max, box_min, median, box_max
    whisker_low = ax.lines[i * 5].get_ydata()[1]
    whisker_high = ax.lines[i * 5 + 1].get_ydata()[1]
    q1 = ax.lines[i * 5].get_ydata()[0]
    median = ax.lines[i * 5 + 4].get_ydata()[0]
    q3 = ax.lines[i * 5 + 1].get_ydata()[0]

    # Get the center position for the label
    x_value = ax.lines[i * 5 + 3].get_xdata().mean()

    # Label each part with its value
    ax.text(x_value, median, f'{median:.2f}', ha='center', va='center', fontsize=10, fontweight='bold')
    ax.text(x_value, q1, f'{q1:.2f}', ha='center', va='center', fontsize=10, fontweight='bold')
    ax.text(x_value, q3, f'{q3:.2f}', ha='center', va='center', fontsize=10, fontweight='bold')
    ax.text(x_value, whisker_low, f'{whisker_low:.2f}', ha='center', va='center', fontsize=10, fontweight='bold')
    ax.text(x_value, whisker_high, f'{whisker_high:.2f}', ha='center', va='center',fontsize=10, fontweight='bold')


# Show the plot
plt.title('Time until first ambulance arrival (minutes)')
plt.ylabel('Minutes')
plt.show()
[figure: Time until first ambulance arrival (minutes)]
tmp = df[df.unit_type == 'MEDIC']
tmp = tmp.loc[tmp.groupby('incident_number')['dispatch_dttm'].idxmin()]
medic = tmp[(tmp.received_to_onscene >= 10) & (tmp.received_to_onscene < 120)].copy()
print('% data retained: {0:.2f}%'.format(len(medic) * 100 / len(df[df.unit_type == 'MEDIC'])))
print(medic.shape)
medic.unit_id.nunique()
% data retained: 54.91%
(172156, 46)
104
medic.final_priority.value_counts(normalize=True)
final_priority
2    0.665193
3    0.334807
Name: proportion, dtype: float64
fig, axs = plt.subplots(1, 2)
sns.histplot(medic, x='received_to_onscene', ax=axs[0], hue='final_priority')
axs[0].set_title('Raw on scene duration')
sns.histplot(medic, x='received_to_onscene', log_scale=True, ax=axs[1], hue='final_priority')
axs[1].set_title('log(on scene duration)')
[figure: raw and log-scale histograms of received_to_onscene, by final_priority]
import numpy as np
medic['received_to_dispatch_perc'] = medic.received_to_dispatch / medic.received_to_onscene
medic['dispatch_to_response_perc'] = medic.dispatch_to_response / medic.received_to_onscene
medic['response_to_onscene_perc'] = medic.response_to_onscene / medic.received_to_onscene
medic['received_to_dispatch_log'] = np.log(medic.received_to_dispatch)
medic['dispatch_to_response_log'] = np.log(medic.dispatch_to_response)
medic['response_to_onscene_log'] = np.log(medic.response_to_onscene)
medic['received_to_onscene_log'] = np.log(medic.received_to_onscene)
[output: repeated SettingWithCopyWarning; RuntimeWarning: divide by zero encountered in log - some stage durations are 0, so their logs are -inf]
df.loc[:,'received_hour'] = df.received_dttm.dt.hour
medic.loc[:,'received_hour'] = medic.received_dttm.dt.hour
df.loc[:,'received_day_of_week'] = df.received_dttm.dt.dayofweek
medic.loc[:,'received_day_of_week'] = medic.received_dttm.dt.dayofweek
medic_onscene_hr = medic.groupby('received_hour').received_to_onscene.mean().to_frame().reset_index()
tmp = df.groupby('call_number').received_hour.first().to_frame().reset_index()
df_call_count = tmp.groupby('received_hour').size().to_frame(name='count').reset_index()

fig, ax1 = plt.subplots()

# Plot the first line plot
sns.lineplot(x='received_hour', y='received_to_onscene', data=medic_onscene_hr, ax=ax1, color='b',
             label='Mean response duration')

# Create a second y-axis with twinx
ax2 = ax1.twinx()

# Plot the second line plot on the second y-axis
sns.lineplot(x='received_hour', y='count', data=df_call_count, ax=ax2, color='r', label='# calls received')

lines_1, labels_1 = ax1.get_legend_handles_labels()
lines_2, labels_2 = ax2.get_legend_handles_labels()
ax1.legend(lines_1 + lines_2, labels_1 + labels_2, loc='upper left')
ax2.get_legend().remove()
# Add labels for both y-axes

ax1.set_ylabel('Ambulance response duration (min)', color='b')
ax2.set_ylabel('# calls received', color='r')
ax1.set_xlabel('Hour that call was received')
plt.title('Calls received and response times by hour of day')
# Show the plot
plt.show()
[figure: Calls received and response times by hour of day]
medic_onscene_day = medic.groupby('received_day_of_week').received_to_onscene.mean().to_frame().reset_index()
tmp = df.groupby('call_number').received_day_of_week.first().to_frame().reset_index()
df_call_count = tmp.groupby('received_day_of_week').size().to_frame(name='count').reset_index()

fig, ax1 = plt.subplots()

# Plot the first line plot
sns.lineplot(x='received_day_of_week', y='received_to_onscene', data=medic_onscene_day, ax=ax1, color='b',
             label='Mean response duration')

# Create a second y-axis with twinx
ax2 = ax1.twinx()

# Plot the second line plot on the second y-axis
sns.lineplot(x='received_day_of_week', y='count', data=df_call_count, ax=ax2, color='r', label='# calls received')

lines_1, labels_1 = ax1.get_legend_handles_labels()
lines_2, labels_2 = ax2.get_legend_handles_labels()
ax1.legend(lines_1 + lines_2, labels_1 + labels_2, loc='upper left')
ax2.get_legend().remove()
# Add labels for both y-axes

ax1.set_ylabel('Ambulance response duration (min)', color='b')
ax2.set_ylabel('# calls received', color='r')
ax1.set_xlabel('Day of week')
ax1.set_xticks(range(7))
ax1.set_xticklabels(['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'])
plt.title('Calls received and response times by day of week')
# Show the plot
plt.show()
[figure: Calls received and response times by day of week]
sns.jointplot(medic.sample(frac=0.2, random_state=0),
            x='received_to_dispatch_perc', y='received_to_onscene_log',
           hue='final_priority', kind="kde", hue_order=['2', '3']);
plt.suptitle("% duration(received -> dispatch) vs log(response duration)");
plt.xlabel('% time spent to dispatch');
plt.ylabel('log (response duration)');
plt.tight_layout();
[figure: % duration(received -> dispatch) vs log(response duration)]
sns.jointplot(medic.sample(frac=0.2, random_state=0),
            x='dispatch_to_response_perc', y='received_to_onscene_log',
           kind='kde', hue='final_priority', hue_order=['2', '3']);
plt.suptitle("% duration(dispatch -> response) vs log(response duration)");
plt.xlabel('% duration(dispatch -> response)');
plt.ylabel('log (response duration)');
plt.tight_layout();
[figure: % duration(dispatch -> response) vs log(response duration)]
sns.jointplot(medic.sample(frac=0.2, random_state=0),
            x='response_to_onscene_perc', y='received_to_onscene_log',
           kind='kde', hue='final_priority', hue_order=['2', '3']);
plt.suptitle("% duration(response -> on scene) vs log(response duration)");
plt.xlabel('% duration(response -> onscene)');
plt.ylabel('log (response duration)');
plt.tight_layout();
[figure: % duration(response -> on scene) vs log(response duration)]
medic_sorted = medic.sort_values('received_dttm')
# For each unit, the availability timestamp from its previous call (NaT for the unit's first call)
medic_sorted['prev_avail'] = medic_sorted.groupby('unit_id')['available_dttm'].shift(1)
medic_sorted.prev_avail = pd.to_datetime(medic_sorted.prev_avail)
medic_sorted['prev_rest_min'] = (medic_sorted.received_dttm - medic_sorted.prev_avail).dt.total_seconds() / 60
tmp = medic_sorted.sample(frac=0.01, random_state=0)
sns.jointplot(tmp[(tmp.prev_rest_min < 2500) & (tmp.prev_rest_min > -50)],
            x='prev_rest_min', y='received_to_dispatch_perc', hue='final_priority', hue_order=['2', '3'],
             kind='kde')
[figure: prev_rest_min vs % duration(received -> dispatch), by final_priority]

Next steps: compare against previous availability to see whether long on-scene durations are an issue, and look further into received-to-dispatch time. Otherwise, locate problematic geographical areas to see whether resource allocation is an issue, and quantify how much improvement is needed to hit the KPI.

sum(medic_sorted.prev_rest_min < 0) / len(medic_sorted)
medic_sorted['avail_when_received'] = medic_sorted.prev_rest_min > 0
sns.boxplot(medic_sorted, y='received_to_dispatch_perc', hue='avail_when_received');
plt.title("% duration(received -> dispatch) for busy and available ambulances");
plt.ylabel('% duration(received -> dispatch)');
[figure: % duration(received -> dispatch) for busy and available ambulances]

About 0.01% of the data is inconsistent: the previous available_dttm is later than the next call’s response_dttm. Regardless, the conclusion is that when a unit is not available at the moment a call is received (~8% of the time), the mean share of time spent waiting for dispatch is about 50% larger, and the 75th percentile is twice as large.

There is also a lot of outlier data in dispatch-to-response time, possibly due to data errors.

Next, check whether travel distance is worse for certain areas.

def dedup_calls(data, condition=None):
    if condition is not None:
        filtered = data[condition]
    else:
        filtered = data
    
    df = filtered.groupby(['slong', 'slat']).agg({
        'lat': 'mean',
        'long': 'mean',
    'norm_lat': ['mean','count'],
    'norm_long': 'mean',
        'received_to_onscene': 'mean'
    })
    df.columns = [' '.join(col).strip() for col in df.columns.values]
    df = df.rename(columns={'norm_lat mean': 'norm_lat', 'norm_lat count': 'count', 'norm_long mean': 'norm_long',
                           'lat mean': 'lat', 'long mean': 'long', 'received_to_onscene mean': 'received_to_onscene'})
    df['count'] *= df['received_to_onscene']  # weight = n * mean = total response minutes at this location
    return df
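The final line of `dedup_calls` deserves a note: multiplying `count` by the mean makes each deduplicated location's weight equal to its total response time (n × mean = sum), so the clustering below weights locations by total minutes, not call counts. A toy check:

```python
import pandas as pd

# Two calls at location 'a', one at 'b'
calls = pd.DataFrame({
    'slat': ['a', 'a', 'b'],
    'received_to_onscene': [10.0, 20.0, 5.0],
})

agg = calls.groupby('slat').received_to_onscene.agg(['mean', 'count', 'sum'])
weight = agg['count'] * agg['mean']  # what dedup_calls computes as 'count'
# n * mean equals the per-location total of response minutes
assert (weight == agg['sum']).all()
print(weight.to_dict())
```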
from sklearn import cluster
x = dedup_calls(medic_sorted)
print(x.shape)
km = cluster.KMeans(n_clusters=50, random_state=0)
preds = pd.Series(km.fit_predict(x[['norm_lat', 'norm_long']], sample_weight=x['count'])).to_frame(name='cluster')
preds.index = x.index
ax = sns.scatterplot(pd.concat([x, preds], axis=1), x='norm_long', y='norm_lat', hue='cluster', palette=sns.color_palette("hls", 50));
ax.get_legend().remove()
(10237, 6)
[figure: scatter of the 50 location clusters in normalized coordinates]
medic_sorted_clustered = pd.merge(medic_sorted, preds, on=['slong', 'slat'])
assert len(medic_sorted) == len(medic_sorted_clustered)
assert medic_sorted.shape[1] + 1 == medic_sorted_clustered.shape[1]
cluster_metrics = medic_sorted_clustered.groupby('cluster').response_to_onscene.mean()
ordered_clusters = list((cluster_metrics).sort_values().index)
sns.catplot(medic_sorted_clustered, x='cluster', y='response_to_onscene', kind="box", log_scale=True,
           height=5, aspect=2, order=ordered_clusters)
[figure: response_to_onscene by cluster, boxplots ordered by cluster mean]
sns.scatterplot(medic_sorted_clustered[medic_sorted_clustered.cluster.isin(ordered_clusters[-20:])],
               x='norm_long', y='norm_lat', hue='cluster');
[figure: the 20 clusters with the highest mean response_to_onscene]
import geopandas as gpd
nhoods = gpd.read_file('raw_data_analysis/Analysis Neighborhoods.geojson')
stations = pd.read_csv('raw_data_analysis/stations.csv')
hospitals = pd.read_csv('raw_data_analysis/hospitals.csv')
import plotly.graph_objs as go
import numpy as np

def show_clusters_i(data, ordered_clusters):

    sample = data.copy().sample(frac=0.2, random_state=0)

    # Function to generate new data dynamically
    def get_lats(cluster):
        return sample[sample.cluster == cluster].lat

    def get_longs(cluster):
        return sample[sample.cluster == cluster].long

    # Create a separate trace for each cluster
    traces = []
    max_clusters = 20
    for cluster in list(reversed(ordered_clusters))[:max_clusters]:
        traces.append(go.Scatter(
            x=get_longs(cluster), 
            y=get_lats(cluster), 
            mode='markers',
            name=f'Cluster {cluster}',
            visible=False  # Initially, all traces are hidden
        ))

    for trace in traces[:10]:
        trace.visible = True

    def get_polygon_coords(geometry):
        """Extract x, y coordinates from Polygon or MultiPolygon."""
        if geometry.geom_type == 'Polygon':
            # For a single Polygon, get the exterior coordinates
            x, y = geometry.exterior.xy
            return [(list(x), list(y))]  # Return a list with one tuple of (x, y) coordinates
        
        elif geometry.geom_type == 'MultiPolygon':
            # For a MultiPolygon, get coordinates for each Polygon's exterior
            polygons_coords = []
            for polygon in geometry.geoms:  # Iterate over each Polygon in the MultiPolygon
                x, y = polygon.exterior.xy
                polygons_coords.append((list(x), list(y)))
            return polygons_coords
        
        return []  # In case there are no coordinates

    # Create scatter traces for each polygon
    polygon_traces = []
    for i, row in nhoods.iterrows():
        geometry = row['geometry']
        nhood_name = row['nhood']
        
        coords = get_polygon_coords(geometry)
        
        if geometry.geom_type == 'Polygon':
            # For simple polygons, add one trace
            x, y = coords
            polygon_traces.append(go.Scatter(
                x=x,
                y=y,
                name=nhood_name,
                line=dict(color='black'),
                    opacity=0.5
            ))
        elif geometry.geom_type == 'MultiPolygon':
            # For multipolygons, add multiple traces
            for x, y in coords:
                polygon_traces.append(go.Scatter(
                    x=x,
                    y=y,
                    name=nhood_name,
                    line=dict(color='black'),
                    opacity=0.5
                ))

    # Create steps for the slider, which will control the visibility of the clusters
    steps = []
    for i in range(max_clusters):  # Now the slider will go from 1 to 20 clusters
        step = dict(
            method='update',
            args=[{'visible': [False] * len(traces + polygon_traces)},  # Hide all traces initially
                {'title': f'Top {i+1} areas with largest mean response duration'}],  
            label=f'{i + 1}'
        )
        step['args'][0]['visible'][:i+1] = [True] * (i+1)
        step['args'][0]['visible'][max_clusters:] = [True] * len(polygon_traces)
        steps.append(step)

    # Create the slider
    sliders = [dict(
        active=9,
        currentvalue={"prefix": "Number of clusters: "},
        pad={"t": 50},
        steps=steps
    )]

    # Layout with a slider and buttons
    layout = go.Layout(
        height=800,
        title="Top 10 areas with largest mean response duration",
        sliders=sliders,
        xaxis=dict(
            title='longitude',
            range=[-122.52, -122.32],
            ),
        yaxis=dict(
            title='latitude',
            range=[37.7, 37.85],
            ),
        showlegend=False
    )

    # # Create the figure with initial scatter and polygons
    fig = go.Figure(layout=layout)
    for trace in traces:
        fig.add_trace(trace)
    for trace in polygon_traces:
        fig.add_trace(trace)
    
    # # Export to HTML (for use in Jupyter Book or web embedding)
    # fig.write_html("scatter_dynamic_polygons.html")

    # Display the figure (in case you're running this interactively)
    fig.show()
show_clusters_i(medic_sorted_clustered, ordered_clusters)

Conclusion: inner clusters have a worse mean time from response to on scene. This could partly be data quality, but the data points to driving time as the biggest factor contributing to longer response times - not units being preoccupied. Better resource allocation, better procedures for getting to the scene, or simply clearer roads that let units drive faster would all help, especially in the inner clusters.

When not weighting for frequency of calls, the outer clusters are the worst.

from sklearn import cluster

def display_clusters_by_onscene_time(data, condition):
    data = data.copy()
    data = data[condition]
    x = dedup_calls(data)
    km = cluster.KMeans(n_clusters=50, random_state=0)
    preds = pd.Series(km.fit_predict(x[['norm_lat', 'norm_long']], sample_weight=x['count'])).to_frame(name='cluster')
    preds.index = x.index
    data_clustered = pd.merge(data, preds, on=['slong', 'slat'])

    cluster_metrics = data_clustered.groupby('cluster').response_to_onscene.mean()
    ordered_clusters = list(cluster_metrics.sort_values().index)
    
#     sns.catplot(data_clustered, x='cluster', y='response_to_onscene', kind="box", log_scale=True,
#            height=5, aspect=2, order=ordered_clusters)

    show_clusters_i(data_clustered, ordered_clusters)
display_clusters_by_onscene_time(medic_sorted, medic_sorted.final_priority == '3')
display_clusters_by_onscene_time(medic_sorted, medic_sorted.final_priority == '2')
from scipy.stats import permutation_test

def run_permutation_test(data, condition):
    data = data.copy()
    data = data[condition]
    x = dedup_calls(data)
    km = cluster.KMeans(n_clusters=50, random_state=0)
    preds = pd.Series(km.fit_predict(x[['norm_lat', 'norm_long']], sample_weight=x['count'])).to_frame(name='cluster')
    preds.index = x.index
    data_clustered = pd.merge(data, preds, on=['slong', 'slat'])

    cluster_metrics = data_clustered.groupby('cluster').response_to_onscene.mean()
    ordered_clusters = list(cluster_metrics.sort_values().index)
    
    data_clustered['worst_clusters'] = data_clustered.cluster.isin(ordered_clusters[-20:])
    
    sampled = data_clustered.sample(frac=0.1, random_state=0)
    group1 = sampled[sampled.worst_clusters].response_to_onscene
    group2 = sampled[~sampled.worst_clusters].response_to_onscene

    # Run the permutation test
    result = permutation_test((group1, group2), 
                              statistic=lambda x, y: np.mean(x) - np.mean(y), 
                              n_resamples=10000,  # Monte Carlo approximation with 10,000 resamples
                              alternative='greater')
    print(result)
# run_permutation_test(medic_sorted, medic_sorted.final_priority == '2')
# run_permutation_test(medic_sorted, medic_sorted.final_priority == '3')
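As a sanity check on how `permutation_test` behaves, here is the same statistic applied to synthetic groups with a true mean gap; the p-value comes out near the resampling floor (toy data, not the project's):

```python
import numpy as np
from scipy.stats import permutation_test

rng = np.random.default_rng(0)
slow = rng.normal(12.0, 2.0, 200)   # e.g. response times in the "worst" clusters
fast = rng.normal(10.0, 2.0, 200)   # everywhere else

result = permutation_test(
    (slow, fast),
    statistic=lambda x, y: np.mean(x) - np.mean(y),
    n_resamples=2000,
    alternative='greater',
)
# statistic ~ 2 minutes; pvalue bottoms out near 1/(n_resamples+1)
print(result.statistic, result.pvalue)
```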

Back to investigating what’s correlated with prev_rest_min < 0

medic_sorted.loc[:, 'has_transport'] = medic_sorted.transport_dttm.notnull()
# For each unit, whether its previous call involved a transport (NaN for the unit's first call)
medic_sorted['prev_has_transport'] = medic_sorted.groupby('unit_id')['has_transport'].shift(1)
tmp = medic_sorted.sample(frac=0.2, random_state=0)
sns.jointplot(tmp[(tmp.prev_rest_min < 2500) & (tmp.prev_rest_min > -50)],
            x='prev_rest_min', y='received_to_dispatch_perc', hue='prev_has_transport',
             kind='kde')
[figure: prev_rest_min vs % duration(received -> dispatch), by prev_has_transport]

Investigating diversions#

# %load_ext autoreload
# %autoreload 2
# from utils import get_hospital_diversions
diversions = pd.read_parquet('raw_data_analysis/hospital_diversions.parquet')
diversions.head()
hospital_full hospital_abbrev diversion_category started ended duration_in_minutes
0 California Pacific Medical Center - Van Ness CPMC - Van Ness ED 2024-04-30 14:35:00 2024-04-30 14:44:00 9.0
1 California Pacific Medical Center - Van Ness CPMC - Van Ness ED 2024-04-30 20:31:00 2024-04-30 22:27:00 116.0
2 California Pacific Medical Center - Van Ness CPMC - Van Ness ED 2024-05-01 09:18:00 2024-05-01 11:18:00 120.0
3 California Pacific Medical Center - Van Ness CPMC - Van Ness ED 2024-05-01 11:21:00 2024-05-01 11:28:00 7.0
4 California Pacific Medical Center - Van Ness CPMC - Van Ness ED 2024-05-01 15:27:00 2024-05-01 17:27:00 120.0
diversions.shape
(12181, 6)
diversions.duration_in_minutes = diversions.duration_in_minutes.astype(float)
diversions.duration_in_minutes.plot.hist(bins=100)
[figure: histogram of diversion duration_in_minutes]
diversions[diversions.duration_in_minutes < 60]
|   | hospital_full | hospital_abbrev | diversion_category | started | ended | duration_in_minutes |
|---|---|---|---|---|---|---|
| 0 | California Pacific Medical Center - Van Ness | CPMC - Van Ness | ED | 2024-04-30 14:35:00 | 2024-04-30 14:44:00 | 9.0 |
| 3 | California Pacific Medical Center - Van Ness | CPMC - Van Ness | ED | 2024-05-01 11:21:00 | 2024-05-01 11:28:00 | 7.0 |
| 5 | California Pacific Medical Center - Van Ness | CPMC - Van Ness | ED | 2024-05-01 17:29:00 | 2024-05-01 18:22:00 | 53.0 |
| 6 | California Pacific Medical Center - Van Ness | CPMC - Van Ness | ED | 2024-05-01 22:21:00 | 2024-05-01 22:32:00 | 11.0 |
| 7 | California Pacific Medical Center - Van Ness | CPMC - Van Ness | ED | 2024-05-02 02:32:00 | 2024-05-02 02:44:00 | 12.0 |
| ... | ... | ... | ... | ... | ... | ... |
| 12174 | Chinese Hospital | Chinese | ED | 2024-03-16 12:25:00 | 2024-03-16 12:27:00 | 2.0 |
| 12175 | Chinese Hospital | Chinese | ED | 2024-03-18 10:15:00 | 2024-03-18 10:17:00 | 2.0 |
| 12177 | Chinese Hospital | Chinese | ED | 2024-03-19 16:01:00 | 2024-03-19 16:03:00 | 2.0 |
| 12178 | Chinese Hospital | Chinese | ED | 2024-03-29 12:48:00 | 2024-03-29 13:43:00 | 55.0 |
| 12179 | Chinese Hospital | Chinese | ED | 2024-04-02 11:40:00 | 2024-04-02 11:42:00 | 2.0 |

5000 rows × 6 columns

# Define the range of 30-minute periods to analyze
all_periods = pd.date_range(start=diversions['started'].min().floor('30min'),
                            end=diversions['ended'].max().ceil('30min'), freq='30min')

# DataFrame to store the number of hospitals on diversion in each period
diversion_counts = pd.DataFrame({'period': all_periods})

# For each period, count the hospitals whose diversion interval covers it
diversion_counts['diversion_count'] = diversion_counts['period'].apply(
    lambda h: np.sum((diversions['started'] <= h) & (diversions['ended'] > h))
)
diversion_counts.head()
|   | period | diversion_count |
|---|---|---|
| 0 | 2023-01-01 02:00:00 | 0 |
| 1 | 2023-01-01 02:30:00 | 1 |
| 2 | 2023-01-01 03:00:00 | 1 |
| 3 | 2023-01-01 03:30:00 | 1 |
| 4 | 2023-01-01 04:00:00 | 1 |
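The per-period count above does a full scan of `diversions` for every period. An equivalent sweep-line formulation (+1 at each start, −1 at each end, then a cumulative sum) gets the same counts in a single pass. A sketch on invented intervals, using the same `started`/`ended` schema:

```python
import pandas as pd

# Invented diversion intervals (same schema as `diversions`).
iv = pd.DataFrame({
    'started': pd.to_datetime(['2024-01-01 00:10', '2024-01-01 00:40']),
    'ended':   pd.to_datetime(['2024-01-01 01:10', '2024-01-01 00:50']),
})
periods = pd.date_range('2024-01-01 00:00', '2024-01-01 01:30', freq='30min')

# +1 when a diversion starts, -1 when it ends; the running sum is the
# number of open diversions. An interval ending exactly at time h is not
# counted at h, matching (started <= h) & (ended > h).
events = pd.concat([
    pd.Series(1, index=iv['started']),
    pd.Series(-1, index=iv['ended']),
]).sort_index().cumsum()

# Carry the last event value forward onto the 30-minute grid.
counts = (events.reindex(events.index.union(periods))
                .ffill().fillna(0).loc[periods])
```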
medic_sorted['received_30T'] = medic_sorted.received_dttm.dt.floor('30min')
medic_sorted['received_30T_prev'] = medic_sorted.received_30T - pd.Timedelta(minutes=30)
medic_sorted_with_diversions = medic_sorted.merge(
    diversion_counts, how='inner',
    left_on='received_30T', right_on='period')
medic_sorted_with_diversions = medic_sorted_with_diversions.merge(
    diversion_counts, how='inner',
    left_on='received_30T_prev', right_on='period',
    suffixes=('_curr', '_prev'))
medic_sorted_with_diversions.head()
id call_number unit_id incident_number call_type call_date watch_date received_dttm entry_dttm dispatch_dttm ... prev_rest_min avail_when_received has_transport prev_has_transport received_30T received_30T_prev period_curr diversion_count_curr period_prev diversion_count_prev
0 230010486-66 230010486 66 23000085 Medical Incident 2023-01-01 2022-12-31 2023-01-01 02:35:05 2023-01-01 02:35:05 2023-01-01 03:04:46 ... -15.400000 False False True 2023-01-01 02:30:00 2023-01-01 02:00:00 2023-01-01 02:30:00 1 2023-01-01 02:00:00 0
1 230010507-60 230010507 60 23000091 Medical Incident 2023-01-01 2022-12-31 2023-01-01 02:36:52 2023-01-01 02:40:11 2023-01-01 02:40:33 ... 147.883333 True True True 2023-01-01 02:30:00 2023-01-01 02:00:00 2023-01-01 02:30:00 1 2023-01-01 02:00:00 0
2 230010499-68 230010499 68 23000088 Medical Incident 2023-01-01 2022-12-31 2023-01-01 02:38:43 2023-01-01 02:38:43 2023-01-01 02:55:25 ... -16.633333 False False True 2023-01-01 02:30:00 2023-01-01 02:00:00 2023-01-01 02:30:00 1 2023-01-01 02:00:00 0
3 230010515-77 230010515 77 23000093 Medical Incident 2023-01-01 2022-12-31 2023-01-01 02:39:25 2023-01-01 02:42:10 2023-01-01 02:43:38 ... 6.616667 True False True 2023-01-01 02:30:00 2023-01-01 02:00:00 2023-01-01 02:30:00 1 2023-01-01 02:00:00 0
4 230010525-64 230010525 64 23000094 Medical Incident 2023-01-01 2022-12-31 2023-01-01 02:43:16 2023-01-01 02:45:28 2023-01-01 02:57:10 ... 13.150000 True True True 2023-01-01 02:30:00 2023-01-01 02:00:00 2023-01-01 02:30:00 1 2023-01-01 02:00:00 0

5 rows × 66 columns
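The floor-to-bin-and-merge pattern above can be sanity-checked on toy data (column names follow the notebook; the timestamps and counts are invented):

```python
import pandas as pd

# Two invented calls and a small per-period diversion count table.
calls = pd.DataFrame({'received_dttm': pd.to_datetime(
    ['2023-01-01 02:35', '2023-01-01 02:20'])})
diversion_counts = pd.DataFrame({
    'period': pd.to_datetime(['2023-01-01 02:00', '2023-01-01 02:30']),
    'diversion_count': [0, 1],
})

# Floor each call to its 30-minute bin, then join that bin's count.
calls['received_30T'] = calls['received_dttm'].dt.floor('30min')
merged = calls.merge(diversion_counts, how='inner',
                     left_on='received_30T', right_on='period')
```

The 02:35 call lands in the 02:30 bin and picks up that bin's count; the 02:20 call lands in the 02:00 bin.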

sns.jointplot(medic_sorted_with_diversions.sample(frac=0.2, random_state=0), x='diversion_count_curr',
             y='received_to_dispatch_perc', kind='kde')
[Figure: KDE joint plot of diversion_count_curr vs. received_to_dispatch_perc]
sns.jointplot(medic_sorted_with_diversions.sample(frac=0.2, random_state=0), x='diversion_count_prev',
             y='received_to_dispatch_perc', kind='kde')
[Figure: KDE joint plot of diversion_count_prev vs. received_to_dispatch_perc]
sns.jointplot(medic_sorted_with_diversions.sample(frac=0.2, random_state=0), x='diversion_count_prev',
             y='received_to_onscene_log', kind='kde')
[Figure: KDE joint plot of diversion_count_prev vs. received_to_onscene_log]